Cotton Genomics and Genetics, 2024, Vol. 15, No. 1 doi: 10.5376/cgg.2024.15.0002
Received: 10 Dec., 2023 Accepted: 12 Jan., 2024 Published: 25 Jan., 2024
Ding D.Y., 2024, The role of GWAS in cotton fiber quality improvement, Cotton Genomics and Genetics, 15(1): 9-19 (doi: 10.5376/cgg.2024.15.0002)
This study summarizes the application of genome-wide association studies (GWAS) to improving cotton fiber quality and its potential contribution to the textile industry. Cotton is an important raw material for the global textile industry, and its fiber quality directly affects the market value of products. In recent years, GWAS has been widely used as a powerful genetic tool to identify key genes that affect cotton fiber quality. The article first introduces the principle of GWAS and its importance in plant genetic improvement. It then reviews the genetic basis of cotton fiber quality and the main results obtained with GWAS. Although there are technical and methodological challenges, such as the complexity of data collection and the control of false positive results, these challenges can be addressed by integrating multi-omics data and developing new statistical methods. Looking ahead, GWAS is expected to play a more important role in improving cotton quality, promoting the development of high-quality cotton varieties, and meeting the market's demand for high-quality textiles. This article emphasizes the importance of continued GWAS research in cotton improvement, which not only advances textile materials science but also contributes to the progress of the global textile industry.
Cotton (Gossypium spp.) is an important part of the global textile industry, and its fiber quality directly affects the market value and application scope of the product. In recent years, with the development of biotechnology, the role of genetics in improving crop quality has become increasingly prominent, especially in improving the quality of cotton fiber. Genome-wide association studies (GWAS), as a powerful genetic research tool, provide new ways to gain insight into the genetic mechanisms controlling cotton fiber quality by analyzing the association between thousands of genetic variants and specific traits.
The success of GWAS relies on the availability of large numbers of biological samples, high-throughput genotyping technology, and sophisticated bioinformatics analysis methods. This research method builds on the establishment of biobanks, the results of the International HapMap Project, and the development of high-density genotyping chips (Marees et al., 2018). By comparing frequency differences in millions of single nucleotide polymorphisms (SNPs) between disease groups and healthy controls, GWAS can identify genetic variants associated with specific traits. In addition, GWAS not only focuses on the effects of individual SNPs, but also explores the contribution of complex interactions between SNPs (i.e., epistasis) to complex diseases.
When conducting GWAS, researchers usually face the challenge of large amounts of data, which require complex analysis with statistical software. For example, software tools such as PLINK, PRSice, and R are widely used in GWAS analysis to help researchers identify genetic variants associated with specific traits and calculate polygenic risk scores (PRS) that combine multiple genetic variants (Marees et al., 2018). These analyses not only enhance the understanding of genetic risk but also provide a basis for further functional studies and the development of therapeutic strategies.
Cotton fiber quality improvement benefits significantly from the richness and depth of genetic information that GWAS can reveal. Through association analysis, researchers can identify key genetic factors that affect important quality characteristics such as cotton fiber length, strength, and fineness. These results not only enrich our understanding of the genetic basis of cotton fiber development, but also make it possible to improve cotton varieties through molecular breeding technology.
1 Basic Knowledge of GWAS
1.1 Introduction to the principles and methods of GWAS
Genome-wide association studies (GWAS) are a method used to study the association between genetic variants and specific traits, especially in large sample data. GWAS typically involves targeted genotyping of specific and pre-selected variants using microarray technology, while whole-exome sequencing (WES) and whole-genome sequencing (WGS) aim to capture all genetic variation. The goal of these studies is to identify genetic variants that are statistically associated with a specific trait or disease (Emil et al., 2021).
Conducting a GWAS first requires selecting an appropriate study population, which often means a large sample size in order to identify reproducible, genome-wide significant associations. The sample size can be determined by power calculation using software tools such as CaTS or GPC. Study designs can include cases and controls if the trait is dichotomous, or quantitative measurements of the entire study sample if the trait is quantitative. Genotyping is typically performed using microarrays to target common variants, or using next-generation sequencing methods such as WES or WGS to include rare variants. Microarray genotyping is currently the most common way to obtain genotypes for GWAS because the cost of next-generation sequencing is still relatively high. Ideally, WGS is the method of choice because it can determine nearly every genotype across the entire genome, and it is expected to become the standard approach in the coming years as low-cost WGS technology becomes more widely available (Emil et al., 2021).
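As a rough illustration of the power calculation mentioned above, the following sketch approximates the power of a single-SNP test for a quantitative trait. It is not the algorithm used by CaTS or GPC. It assumes an additive SNP that explains a fraction `h2_snp` of trait variance, a 1-df chi-square test, and a genome-wide threshold of 5e-8; the function names and example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def snp_power(n_samples, h2_snp, alpha=5e-8):
    """Approximate power of a 1-df association test for a quantitative trait.

    Assumes the test statistic is chi-square with 1 df and non-centrality
    parameter n * h2 / (1 - h2), where h2 is the variance explained by the SNP.
    """
    if not 0.0 <= h2_snp < 1.0:
        raise ValueError("h2_snp must be in [0, 1)")
    ncp = n_samples * h2_snp / (1.0 - h2_snp)
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = sqrt(ncp)
    # A 1-df non-central chi-square variable is (Z + shift)^2, so its upper
    # tail is the sum of two normal tail probabilities.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def min_sample_size(h2_snp, target_power=0.8, alpha=5e-8, step=50, n_max=1_000_000):
    """Smallest n (searched in increments of `step`) reaching the target power."""
    for n in range(step, n_max + 1, step):
        if snp_power(n, h2_snp, alpha) >= target_power:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical effect sizes: SNPs explaining 1% and 5% of trait variance.
    for h2 in (0.01, 0.05):
        print(h2, min_sample_size(h2))
```

Under these assumptions, smaller effects require markedly larger samples, which is why the text stresses sample size as the first design decision.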
Conducting GWAS also requires careful processing of the data to avoid false positive or false negative genetic signals and biased test statistics caused by issues such as population stratification. For example, failing to match the ancestral background of cases and controls may lead to confounding. In a hypothetical GWAS on chopstick use, if cases are defined as people who frequently use chopsticks and controls as people who do not, the cases are more likely to come from East Asian populations. Disregarding ancestral background in this study would identify variants that are more common in East Asian populations than in other populations, such as specific human leukocyte antigen (HLA) alleles, not because these variants contribute to dexterity, but because cultural practices act as confounding variables here.
An important benefit of GWAS is that it has advanced understanding of the genetic basis of many complex traits and diseases by identifying the genetic loci associated with them. For example, GWAS has successfully revealed genetic loci associated with a variety of diseases and traits, including diabetes, coronary artery disease, schizophrenia, and height. However, GWAS also faces some limitations, such as insufficient coverage of rare variants, limited statistical power, and the possibility that the genetic architecture of complex traits is more complex than GWAS can explain (Tam et al., 2019).
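The confounding described in the chopstick example can be made concrete with a small numerical sketch. The allele counts below are invented for illustration: within each population the allele frequency is identical in cases and controls, yet pooling the two populations produces a strong spurious association. The Mantel-Haenszel estimator is used here as one simple way to adjust for a known stratum; it is not presented as the method of any study cited in this review.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table (case_A, case_a, control_A, control_a)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata of 2x2 tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical allele counts: (case_A, case_a, control_A, control_a).
# Within each population the allele has no effect on the trait.
population_1 = (720, 180, 80, 20)   # allele A frequency 0.8, mostly cases
population_2 = (20, 80, 180, 720)   # allele A frequency 0.2, mostly controls

# Ignoring ancestry: add the two tables cell by cell.
pooled = tuple(x + y for x, y in zip(population_1, population_2))
naive_or = odds_ratio(*pooled)

# Accounting for ancestry: combine the per-population tables.
adjusted_or = mantel_haenszel_or([population_1, population_2])

print(f"naive OR = {naive_or:.2f}, stratified OR = {adjusted_or:.2f}")
```

With these made-up counts the naive odds ratio is far above 1 while the stratified estimate is 1, showing that the signal comes entirely from population structure.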
GWAS typically employs microarray technology for targeted genotyping of specific pre-selected variants, whereas whole-exome sequencing (WES) and whole-genome sequencing (WGS) aim to capture all genetic variants. These studies have focused on common variants, but as WGS and WES technologies mature, analysis protocols may need to be expanded to cover specific issues that arise when analyzing rare variants, such as when controlling for population structure or inferring missing genotypes.
Performing GWAS requires a large number of samples to identify reproducible genome-wide significant associations. Determination of sample size can be performed through power calculation software such as CaTS or GPC. Depending on the trait, the study design may include cases versus controls or quantitative measurements of the entire sample. The choice of study population, data resources, and study design depends on the required sample size, the experimental question, and the availability of existing data or the ease of collecting new data. Typically, most GWAS are performed using existing resources, and when more refined phenotypes are required, new data usually need to be collected.
Genotyping is typically performed using microarrays for common variants, or using next-generation sequencing methods such as WES or WGS to include rare variants. Currently, microarray genotyping is the most commonly used method to obtain genotypes in GWAS due to its relatively low cost. However, as the availability of low-cost WGS technology increases, it is expected that WGS will become the method of choice in the next few years.
1.2 Advantages and limitations of GWAS in plant genetics research
Genome-wide association studies (GWAS) are a method used to study the relationship between genetic variations and specific traits or diseases. This approach typically involves targeted genotyping of specific, pre-selected variants using microarray technology, but can also capture all genetic variation via whole-exome sequencing (WES) and whole-genome sequencing (WGS). GWAS often require large samples to identify reproducible, genome-wide significant associations, and study designs can be based on case-control groups or on quantitative measurements across the entire sample population. Genotyping of individuals is typically performed using microarrays because of the current relatively high cost of next-generation sequencing. The genotyping data then require strict quality control, including steps such as removing rare or monomorphic variants, filtering out SNPs with a high proportion of missing calls, and identifying and removing genotyping errors.
A common approach to GWAS is a case-control study, which compares two large groups of individuals: a healthy control group and a group of cases affected by a specific disease. Each individual is typically genotyped at known common SNPs. For each SNP, the analysis tests whether the allele frequency differs significantly between cases and controls. The basic unit of effect size in this method is the odds ratio, which is the ratio of the odds of carrying a particular allele among cases to the odds among controls. In addition, the P value of the odds ratio is calculated to evaluate its statistical significance (Ma et al., 2018). Because a very large number of variant sites are tested, a P value lower than 5×10^-8 is usually required before a variant site is considered significant.
Although GWAS has been highly successful in identifying the genetic basis of complex diseases or traits, there are some limitations. For example, GWAS often focus on common genetic variants, which may lead to overlooking the role of rare variants. In addition, most genetic variants discovered by GWAS contribute only modestly to disease risk, which limits the direct use of these results in clinical applications. Another challenge is that interpretation of GWAS results often requires further functional studies to determine how specific genetic variants affect a disease or trait. The large samples needed for GWAS also require a substantial investment of time and money. Furthermore, researchers need to consider factors such as population stratification when analyzing data to avoid false positive results.
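The quality-control steps listed above can be sketched as a simple filter. This is a minimal illustration, assuming genotypes are coded as 0, 1, or 2 copies of the alternate allele with `None` for a missing call; the thresholds (call rate of at least 0.9, minor allele frequency of at least 0.05), the function names, and the toy data are assumptions chosen for the example and are not taken from any study cited here.

```python
def call_rate(genotypes):
    """Fraction of non-missing genotype calls (missing coded as None)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1, or 2 copies of the alternate allele."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(snp_genotypes, min_call_rate=0.9, min_maf=0.05):
    """Keep SNPs that pass both thresholds (threshold values are illustrative)."""
    return {
        snp: calls
        for snp, calls in snp_genotypes.items()
        if call_rate(calls) >= min_call_rate
        and minor_allele_frequency(calls) >= min_maf
    }


# Toy data: one well-behaved SNP and three that should be removed.
toy = {
    "snp_ok": [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],
    "snp_monomorphic": [0] * 10,
    "snp_rare": [0] * 19 + [1],
    "snp_missing": [0, 1, None, None, 2, 1, None, 0, 1, 2],
}
kept = filter_snps(toy)
print(sorted(kept))
```

In practice this kind of filtering is usually done with dedicated tools such as PLINK, which the text mentions; the sketch only shows the logic.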
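The case-control comparison described above can be illustrated with a basic allelic test on a 2x2 table of allele counts. The sketch uses a Pearson chi-square test with 1 degree of freedom and no continuity correction, which is one common choice but an assumption here; the counts are hypothetical. The 5×10^-8 threshold corresponds to a Bonferroni correction of 0.05 for roughly one million independent tests.

```python
from math import erfc, sqrt

GENOME_WIDE_ALPHA = 5e-8  # conventional threshold, about 0.05 / 1,000,000


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds ratio, chi-square, p-value) for a 2x2 allele-count table."""
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_cases = case_alt + case_ref
    row_ctrls = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    # Pearson chi-square for a 2x2 table, without continuity correction.
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / (
        row_cases * row_ctrls * col_alt * col_ref
    )
    # Upper-tail probability of a chi-square distribution with 1 df.
    p_value = erfc(sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value


# Hypothetical counts: 2,000 cases and 2,000 controls (4,000 alleles each).
or_, chi2, p = allelic_test(1400, 2600, 1100, 2900)
print(f"OR = {or_:.3f}, chi2 = {chi2:.2f}, significant = {p < GENOME_WIDE_ALPHA}")
```

A real analysis would normally use regression models that can include covariates, but the 2x2 test shows how the odds ratio and its P value relate to the raw allele counts.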
GWAS is a powerful tool that can reveal associations between genetic variants and complex diseases or traits, but it also presents several methodological and interpretive challenges. Future research is needed to overcome these limitations to better exploit GWAS findings, advance understanding of the genetic basis of complex diseases, and advance the application of these findings in healthcare.
1.3 Overview of the general application of GWAS in crop quality improvement
Genome-wide association studies (GWAS) have evolved into a powerful and ubiquitous tool for exploring complex traits. The development of this method has been facilitated by advances in genomic technologies that allow genome-wide genetic variation to be examined in diverse genetic material. The development of a mixed model framework for GWAS significantly reduces the number of false positives compared to naive methods. Building on this foundation, many methods have been developed to increase computational speed or improve the statistical power of GWAS. These methods allow the detection of genomic variants associated with traditional agricultural traits or with biochemical and molecular traits. These associations can be used for gene cloning and can accelerate crop breeding through marker-assisted selection or genetic engineering (Laura et al., 2021).
In crop quality improvement, GWAS has been applied to explore genetic variation associated with important agricultural traits, which helps understand the genetic basis of crop quality and guide breeding programs. By utilizing the variants identified by GWAS, researchers can discover key genes and pathways related to crop quality, thereby making progress in improving crop quality and adaptability. However, as in human genetics research, translating GWAS findings into practical applications still requires significant work, including validating the function of the genetic variants discovered and developing breeding strategies that can effectively exploit these findings.
Genome-wide association studies (GWAS) are powerful tools for identifying associations between crop traits, such as yield, quality, and stress resistance, and specific genetic markers. This approach mainly uses single nucleotide polymorphism (SNP) markers, as next-generation sequencing (NGS) technology can provide genome-wide marker data in a cost- and time-efficient manner (Muhammad et al., 2022). In recent years, researchers have increasingly used haplotype-based GWAS analysis because haplotype blocks have higher mapping accuracy and power than individual SNPs, which helps to detect quantitative trait loci (QTLs) and genes. Because they are multi-allelic, haplotypes provide more information than a single biallelic SNP. For example, haplotype analysis can capture epistatic interactions between sites, provide more information for estimating whether two alleles are identical by descent (IBD), lower the type I error rate by reducing the number of tests, capture information from evolutionary history, and analyze the family of alleles present at a specific locus more efficiently than a single-marker system.
In multiple important crop species, haplotype-based GWAS analysis has successfully identified important QTLs and candidate genes related to traits such as yield, quality, and stress resistance. These species include Arabidopsis thaliana, soybean, wheat, barley, rice, and maize, among others. Haplotype analysis showed greater power than SNP-based GWAS in detecting genetic loci associated with plant height and biomass (Bhat et al., 2021). In a haplotype-based analysis of drought tolerance in maize, fewer significant associations and candidate genes were detected, but the phenotypic variation explained (PVE) was higher.
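To make the contrast between biallelic SNPs and multi-allelic haplotypes concrete, the sketch below collapses three SNPs into a single haplotype marker and counts its alleles. It assumes phased data, with one string per chromosome copy and one character per SNP; the sequences, the block definition, and the function name are hypothetical.

```python
from collections import Counter


def haplotype_alleles(phased_haplotypes, block):
    """Collapse the SNPs at the column indices in `block` into one marker."""
    return ["".join(hap[i] for i in block) for hap in phased_haplotypes]


# Hypothetical phased haplotypes: one string per chromosome copy,
# one character per SNP position.
phased = [
    "ACGT", "ACGT", "ATGT", "GCAT",
    "GCAT", "ACGT", "ATGT", "GCAT",
]
block = [0, 1, 2]  # the first three SNPs form one haplotype block

alleles = haplotype_alleles(phased, block)
frequencies = Counter(alleles)

# Three biallelic SNPs would need three tests; the block needs one test
# on a marker with len(frequencies) alleles.
print(frequencies, "tests:", len(block), "->", 1)
```

Here three separate two-allele markers become one marker with three observed alleles, which is the sense in which haplotypes carry more information per test.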
In a specific application case, GWAS studies on spring barley and winter wheat aimed to identify genetic loci related to grain yield, quality and disease resistance through single-trait and multi-trait GWAS. In addition, the study included genotype and site information to evaluate marker effects on grain yield performance of winter wheat at different experimental sites. Through GWAS analysis, SNPs related to traits such as grain yield, quality, and disease resistance were identified, and genomic prediction models for traits of interest were developed, such as rrBLUP and Bayesian Power Lasso models (Figure 1) (Bhat et al., 2021).
Figure 1 Functional haplotype-GWAS (FH-GWAS) analysis for identifying superior haplotypes for traits of interest (Bhat et al., 2021)
These studies not only demonstrate the practical application of GWAS in crop quality improvement, but also highlight the potential of haplotype- and SNP-based GWAS methods in enhancing gene identification accuracy and promoting genomics-assisted breeding (GAB). In this way, GWAS provides a powerful tool for crop improvement that can effectively identify and exploit genetic variants associated with key agronomic traits, thereby supporting the improvement of crop quality and enhancing crop adaptability (Tibbs et al., 2021).
2 The Genetic Basis of Cotton Fiber Quality
2.1 Introduction to the formation and development process of cotton fiber
The formation and development of cotton fiber is a complex biological process that is controlled by genetic factors and also affected by environmental factors. This process can be roughly divided into four stages: initiation, elongation, secondary wall thickening, and maturation.
Cotton fiber development begins on the surface of the ovule. Each fiber is a single cell that arises from an epidermal cell on the ovule surface around the time of flowering. These fiber initials elongate rapidly and begin to form visible fibers. After initiation, cotton fibers enter a period of rapid elongation, during which the fiber cells lengthen through cell wall stretching and increased intracellular pressure. The main component of the cell wall is cellulose, and its structure and arrangement determine the strength and elongation performance of the fiber. After the elongation period, the fiber cells begin to form secondary walls. At this stage, a large amount of cellulose accumulates inside the fiber and the cell wall becomes thicker, which makes this a key stage for the formation of fiber quality (Ma et al., 2018). The formation of the secondary wall is crucial to the strength and elasticity of the fiber. After the secondary wall is completely formed, the cytoplasm in the fiber cells begins to degrade, the cells gradually lose their vitality, and eventually the fiber matures. Mature cotton fibers are hollow, have high strength, and have a good natural luster.
Cotton fiber qualities, including length, strength, fineness and maturity, are gradually formed during this series of developmental stages. Genetic studies have revealed multiple genes and loci that control these traits, providing the possibility to improve cotton fiber quality through molecular breeding.
2.2 Main genetic factors affecting cotton fiber quality
The genetic factors that affect cotton fiber quality are multifaceted and involve many different genes and genetic pathways. The quality of cotton fiber can be measured along multiple dimensions, including length, strength, fineness (micronaire), maturity, and uniformity. Each quality characteristic is controlled by multiple genes, which interact with each other during cotton growth and development to affect final fiber quality.
Fiber length is an extremely important quality indicator in the textile industry. It is controlled by multiple genes that affect the elongation rate and duration of fiber cells. Fiber strength is one of the key factors that determine how well cotton can be processed. It is affected by fiber cell wall thickness and composition, properties controlled by specific groups of genes. Fiber fineness, or diameter, affects the texture and appearance of textiles. The genetic regulation of fiber fineness involves genes that control cell division and elongation. Fiber maturity affects cotton processability and finished product quality (Liu et al., 2018). Immature fibers can lead to reduced textile strength and processing interruptions. The genetic basis of maturity is related to the biosynthesis and deposition of fiber cell walls. Finally, the uniformity of fiber quality determines the overall quality of cotton textiles, and the genetic control of uniformity involves genes that regulate the consistency of fiber development.
Studying the genetic basis of cotton fiber quality often involves the use of molecular markers and genomic selection techniques to identify genes or gene segments associated with specific fiber quality traits. For example, methods such as genome-wide association studies (GWAS) and quantitative trait loci (QTL) mapping have been used to identify genetic variations associated with fiber quality. These studies provide genetic information and molecular tools for improving cotton varieties to improve fiber quality.
2.3 Known major genes and loci that control cotton fiber quality
Cotton fiber quality is a complex trait controlled by multiple genes that affect fiber length, strength, fineness and other important quality parameters. In recent years, with the development of molecular biology technology, especially the application of genome-wide association studies (GWAS), scientists have identified multiple important genes and loci that control cotton fiber quality. For example, the gene GhMYB25 was found to play a key role in regulating the early development of cotton fiber cells, while GhDET2 affects the fiber elongation stage. In addition, a series of genes related to fiber strength, such as GhRDL1 and GhSuS, were also identified, which act at different stages of fiber development, thus affecting the final fiber quality.
For example, one study used third-generation sequencing technology, BioNano optical mapping, and Hi-C chromatin conformation capture to produce high-precision assemblies of the genomes of Gossypium hirsutum and Gossypium barbadense. Compared with the previous draft genomes, the newly assembled Gossypium hirsutum and Gossypium barbadense genomes achieved significant improvements in continuity and completeness (55-fold and 90-fold improvements, respectively), especially in highly repetitive regions; for instance, a breakthrough was made in the assembly of centromeres. Through comparative analysis of the genomes of these two tetraploid cotton species, the study revealed a large number of structural variations, which mainly occurred after the cotton genome underwent allopolyploidization. Notably, the study found that 14 of the 26 chromosome pairs exhibited large inversions within and between chromosome arms, which may explain the difficulties encountered in previous draft assemblies of the cotton genome (Wang et al., 2020).
In addition, through genome analysis of introgression lines between upland cotton and sea-island cotton, the study identified 13 key genetic loci that control fiber quality, and further explored the expression regulation of these loci by analyzing transcriptome data from developing fibers. This work provides an important reference for genetically improving upland cotton using the excellent fiber quality of Gossypium barbadense, and lays the foundation for a long-term goal of cotton breeders: improving the fiber quality of upland cotton through interspecific introgression breeding (Wang et al., 2020).
3 Application of GWAS in Cotton Fiber Quality Improvement
3.1 Review of key studies using GWAS methods to identify genes and loci related to cotton fiber quality
Over the past two decades, the use of high-density SNP arrays and DNA sequencing technologies has revealed much of the genotype space in a variety of crops, including cotton. Genome-wide association studies (GWAS) link phenotypes to their underlying genetics across population genomes. This technology was first developed and applied in the field of human disease genetics and has been widely incorporated into crop research in the past decade. Cotton is an important cash crop accounting for about 35% of global fiber consumption, mainly provided by upland cotton (Gossypium hirsutum), which has wide adaptability and high yield. Since cotton fiber traits are a key component of the global textile industry, they have attracted more attention from researchers than other traits (Liu et al., 2020).
In a GWAS on fiber quality traits in upland cotton (Gossypium hirsutum L.), combining 10 660 high-quality SNP markers with phenotypic data from multiple environments led to the detection of a total of 42 SNPs significantly associated with five fiber quality traits. These SNPs are distributed on 13 chromosomes: At05, At09, At10, At12, Dt01, Dt02, Dt05, Dt06, Dt08, Dt09, Dt10, Dt11 and Dt12. In addition, the study identified 31 QTL containing significant SNPs, distributed on different chromosomes; six of these were newly discovered QTL and the rest overlapped with previously identified QTL. By identifying these SNPs and QTL, the study deepens our understanding of the genetic mechanisms controlling cotton fiber quality traits (Muhammad et al., 2022).
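One simple way to see how a set of significant SNPs can be summarized as a smaller number of QTL is to merge neighbouring hits on the same chromosome into intervals. The sketch below does this with a fixed physical window. The window size, positions, and function name are assumptions made for illustration; the study summarized above may have defined QTL differently (for example, from linkage disequilibrium decay).

```python
def group_snps_into_qtl(significant_snps, window_bp=1_000_000):
    """Merge significant SNPs on the same chromosome into QTL intervals.

    `significant_snps` is an iterable of (chromosome, position_bp) pairs.
    A SNP joins the current interval if it lies within `window_bp` of the
    last SNP already in that interval. The window size is an assumption.
    """
    intervals = []
    for chrom, pos in sorted(significant_snps):
        same_chrom = bool(intervals) and intervals[-1][0] == chrom
        if same_chrom and pos - intervals[-1][2] <= window_bp:
            _, start, _ = intervals[-1]
            intervals[-1] = (chrom, start, pos)
        else:
            intervals.append((chrom, pos, pos))
    return intervals


# Hypothetical significant SNPs as (chromosome, position in bp).
hits = [
    ("At05", 1_200_000), ("At05", 1_650_000), ("At05", 9_000_000),
    ("Dt11", 500_000), ("Dt11", 24_100_000), ("Dt11", 24_900_000),
]
qtl = group_snps_into_qtl(hits)
for chrom, start, end in qtl:
    print(chrom, start, end)
```

With these made-up positions, six SNPs collapse into four intervals, mirroring how 42 SNPs can correspond to 31 QTL.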
3.2 Analysis of genetic control of cotton fiber quality
These GWAS studies provide basic knowledge for cotton fiber quality improvement, pointing out that fiber quality traits are affected by genotype and environment, and are mainly controlled by genetic effects. By identifying SNPs and QTLs associated with fiber quality traits, researchers can more precisely locate genes that have a positive impact on cotton fiber quality, thus providing valuable genetic resources for molecular breeding. These results highlight the potential of GWAS to reveal crop genetic diversity and improve crop traits.
In the improvement of cotton fiber quality, the application of genome-wide association studies (GWAS) has significantly improved our understanding of the genetic control of cotton fiber quality. Research has shown that this genetic control is complex and affected by multiple factors, including genotype, environment, and the interaction between genotype and environment. A GWAS on upland cotton (Gossypium hirsutum L.) revealed 42 single nucleotide polymorphisms (SNPs) significantly associated with five main fiber quality traits (fiber elongation, fiber fineness, fiber strength, fiber length, and fiber uniformity); these SNPs are distributed on 13 chromosomes. In addition, the study identified 31 quantitative trait loci (QTLs), including newly discovered and previously known QTLs (Su et al., 2016). These findings suggest that fiber quality traits may be controlled by multiple genetic loci that are widely distributed in the cotton genome.
Further gene expression analysis revealed that 822 genes within these QTL regions had different expression patterns at different stages of fiber development, suggesting that these genes play important roles in fiber development. In particular, some genes are highly expressed during the fiber elongation phase, while others are preferentially expressed during the fiber secondary cell wall synthesis phase. This analysis provides important clues for further research into the genetic control of cotton fiber quality.
In another GWAS on Gossypium barbadense, researchers analyzed phenotypic variation and genomic data of 279 Gossypium barbadense varieties and identified 102 quantitative trait nucleotides (QTNs) that were significantly associated with five fiber quality traits; these QTNs cover 26 chromosomes. Through further analysis, they identified 34 stable QTLs, including some that were detected repeatedly in different environments, indicating that they have a stable impact on fiber quality. This study also highlights the impact of environmental conditions on the phenotypic distribution of fiber quality traits, as well as the overlap between QTLs and the identification of candidate genes under specific environmental conditions (Liu et al., 2015).
These research examples demonstrate the power of GWAS in revealing the genetic control mechanism of cotton fiber quality. They not only identify key genes and genetic loci that affect fiber quality, but also provide insights into how these genetic factors affect fiber quality under different environmental conditions. This has important implications for designing more effective cotton breeding strategies and improving fiber quality.
The application of genome-wide association studies (GWAS) in cotton fiber quality improvement has advanced our understanding of the genetic control of cotton fiber quality. Through GWAS, scientists are able to link phenotypes to their underlying genetics, which is critical for deciphering the genotypic space of various crops, including cotton. GWAS are particularly useful for revealing the natural genetic variation underlying complex traits, which is critical for the successful development of modern crop breeding programs.
A GWAS of upland cotton (Gossypium hirsutum L.) revealed multiple significant single nucleotide polymorphisms (SNPs) that affect fiber quality. These SNPs are spread across multiple chromosomes and were used to further identify quantitative trait loci (QTLs). This finding indicates that fiber quality is controlled by multiple genetic loci, which are widely distributed in the cotton genome (Ma et al., 2018). Gene expression analysis also revealed that genes within these QTL regions have different expression patterns at different stages of fiber development, emphasizing their important role in fiber development.
In another study, scientists identified quantitative trait nucleotides (QTNs) significantly associated with fiber quality traits by analyzing phenotypic variation and genomic data of Gossypium barbadense, and defined stable QTLs. This study specifically points out how environmental conditions affect the phenotypic distribution of fiber quality traits and how candidate genes can be identified under specific environmental conditions (Muhammad et al., 2022).
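The notion of a stable QTL, one detected repeatedly across environments, can be expressed as a small counting step. The sketch below is only an illustration: the threshold of two environments, the QTL identifiers, and the environment labels are all hypothetical and are not taken from the study described above.

```python
from collections import defaultdict


def stable_qtl(detections, min_environments=2):
    """Return QTL detected in at least `min_environments` distinct environments.

    `detections` is an iterable of (qtl_id, environment) pairs. The default
    threshold of two environments is an illustrative assumption.
    """
    environments = defaultdict(set)
    for qtl_id, env in detections:
        environments[qtl_id].add(env)
    return sorted(
        qtl_id
        for qtl_id, envs in environments.items()
        if len(envs) >= min_environments
    )


# Hypothetical detections: (QTL identifier, environment label).
detections = [
    ("qFL-A01-1", "env_2019"), ("qFL-A01-1", "env_2020"), ("qFL-A01-1", "env_2021"),
    ("qFS-D05-1", "env_2019"),
    ("qFM-D11-2", "env_2020"), ("qFM-D11-2", "env_2020"),
    ("qFE-A09-1", "env_2019"), ("qFE-A09-1", "env_2021"),
]
print(stable_qtl(detections))
```

Repeated detections within the same environment are counted once, so only QTL seen in distinct environments qualify as stable under this definition.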
These findings emphasize the importance of GWAS in the study of genetic control of cotton fiber quality. By identifying key genes and genetic loci, GWAS provides valuable information for cotton breeding, helping to design more effective strategies to improve cotton fiber quality. These developments not only help scientists gain a deeper understanding of the genetic basis of cotton fiber quality, but also provide new ideas for future research directions and cotton breeding practices.
3.3 Examples of how GWAS research has promoted the improvement of cotton varieties
In the improvement of cotton fiber quality, genome-wide association studies (GWAS) have made a significant contribution, especially in identifying genetic loci and genes related to fiber quality. By combining genetics and modern bioinformatics methods, researchers have been able to advance the improvement of cotton varieties, and concrete examples show the practical application of these technologies.
One study discovered four pleiotropic regions related to both cotton fiber yield and quality, and identified 14 high-quality germplasm resources containing these pleiotropic regions through GWAS. These germplasm resources were mainly obtained through mutation breeding and distant hybridization, and show higher yields and better fiber quality. This discovery challenges the long-held view that yield and quality are antagonistic, and provides a new direction for future cotton breeding.
Another GWAS focused on five cotton fiber quality traits. The study used the CottonSNP63K chip to obtain 10,660 high-quality single nucleotide polymorphism (SNP) markers and combined them with phenotypic data collected in multiple environments. It detected 42 SNPs significantly associated with the five fiber quality traits, distributed on 13 chromosomes. In addition, through the identification of quantitative trait loci (QTL) and pleiotropic analysis, the study suggested that these fiber traits may be controlled by a network of pleiotropic QTLs. This work not only increases the understanding of the genomic basis of cotton fiber development, but also provides important genetic resources for future breeding (Figure 2) (Wang et al., 2021).
Figure 2 Phylogenetic relationships of 316 cotton accessions (Wang et al., 2021) Note: a: A neighbor-joining tree was constructed using whole-genome SNP data. The accessions were divided into three groups, group-1 (red), group-2 (cyan) and group-3 (blue); b: Population structure of cotton accessions. The cotton samples were divided into three groups when k = 3; c: Geographic origin of the three groups: Central Asia (CA), the United States (US), the Yellow River (YR), the Yangtze River (YZR) and other places (OTH); d: Phenotype distributions of yield and fiber quality traits in the groups defined by the population structure of the 316 accessions: boll weight (BW), seed index (SI), lint percentage (LP), fiber length (FL), fiber strength (FS) and flowering date (FD).
These examples show that GWAS can effectively identify genetic factors related to cotton fiber quality and, by discovering pleiotropic loci and high-quality germplasm resources, make it possible to improve cotton yield and fiber quality at the same time. This is a significant development for the global textile industry, which relies on high-quality cotton fiber. With such technological progress, cotton breeding can become more efficient and can target specific quality traits more accurately, ultimately leading to better cotton varieties.
4 Improvement of Cotton Fiber Quality
4.1 Technical and methodological challenges faced by applying GWAS in cotton fiber quality improvement
The application of genome-wide association studies (GWAS) in cotton fiber quality improvement faces multiple technical and methodological challenges. First, cotton fiber quality is a complex trait that is affected by multiple genes and environmental factors. Research shows that quality traits such as fiber elongation, micronaire, strength, length and uniformity are highly diverse in upland cotton germplasm resources, are significantly correlated with one another, and are affected by both genotype and environment. When GWAS is applied to fiber quality improvement, the complexity of these traits and their polygenic control must therefore be taken into account (He et al., 2021).
In addition, cotton, as an important industrial crop, has a complex genetic background that includes multiple types of cotton, such as upland cotton and sea island cotton. These types differ in their genetic makeup and fiber quality. For example, research on allotetraploid and other types of cotton has revealed extensive genome structural variation arising during polyploidization and domestication, which provides a theoretical basis for understanding the genetic diversity and breeding of cotton (Li et al., 2023). This complex genetic background of germplasm resources brings challenges to the implementation of GWAS, especially in integrating and analyzing large-scale genomic data to reveal genetic variation related to fiber quality.
Researchers also face data availability and quality issues. Important progress has been made in cotton genomics in recent years, including new findings on genome variation and on the genetics of fiber quality and yield, and large-scale genome resequencing has identified a series of key genetic variants accumulated during natural selection and artificial breeding (Chen et al., 2022). However, high-quality genome sequences, comprehensive phenotypic data, and precise environmental information are all crucial for effective GWAS, and collecting and curating these data requires a significant investment of time and resources.
Therefore, although GWAS shows great potential in revealing the genetic basis of cotton fiber quality improvement, its implementation faces multiple challenges from cotton genetic diversity, data quality, and analysis methods. Future research is needed to develop more efficient genome analysis tools, integrate multi-omics data, and adopt innovative statistical methods to overcome these challenges and thus play a greater role in cotton breeding and genetic improvement.
4.2 How to overcome these challenges
Despite these challenges, published studies indicate that GWAS can effectively reveal genetic variation related to cotton fiber quality. By applying high-density SNP (single nucleotide polymorphism) arrays and DNA sequencing to cotton varieties, scientists have been able to cover much of the crop's genotype space. GWAS then helps identify important genetic loci that control cotton fiber quality by correlating phenotypes with genotypes (Rice et al., 2020).
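The idea of correlating phenotypes with genotypes can be illustrated with a minimal single-marker test. The sketch below is an illustration only, not the method of any study cited here: it regresses a trait on allele dosage (0, 1 or 2 copies of an allele) and estimates a p-value by permutation. The function names, the dosage vectors and the fiber length values are hypothetical, and real cotton GWAS normally use mixed models that also account for population structure and kinship.

```python
import random
import statistics


def marker_effect(genotypes, phenotypes):
    """Least-squares slope of the phenotype on allele dosage (0, 1, 2)."""
    g_mean = statistics.fmean(genotypes)
    p_mean = statistics.fmean(phenotypes)
    sxy = sum((g - g_mean) * (p - p_mean) for g, p in zip(genotypes, phenotypes))
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    # A monomorphic marker (no dosage variation) carries no information.
    return sxy / sxx if sxx > 0 else 0.0


def permutation_p_value(genotypes, phenotypes, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the marker effect."""
    rng = random.Random(seed)
    observed = abs(marker_effect(genotypes, phenotypes))
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        # Shuffling phenotypes breaks any genotype-phenotype link.
        rng.shuffle(shuffled)
        if abs(marker_effect(genotypes, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical fiber length values (mm) for 12 accessions.
fiber_length = [27.1, 27.4, 26.9, 27.2, 28.0, 28.3,
                27.9, 28.1, 29.0, 29.2, 28.8, 29.1]
# Hypothetical dosages at two markers for the same accessions.
linked_marker = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
unlinked_marker = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]

linked_effect = marker_effect(linked_marker, fiber_length)
linked_p = permutation_p_value(linked_marker, fiber_length)
unlinked_p = permutation_p_value(unlinked_marker, fiber_length)
print(f"linked marker: effect={linked_effect:.3f} mm per allele, p={linked_p:.4f}")
print(f"unlinked marker: p={unlinked_p:.4f}")
```

In this toy data set the first marker tracks fiber length closely and receives a small p-value, while the second does not. A permutation test is used here only because it needs no statistical library; it is one possible choice, not a recommendation over the model-based tests used in practice.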
For example, a GWAS on fiber quality traits of upland cotton (Gossypium hirsutum L.) found multiple significant single nucleotide polymorphisms (SNPs) and quantitative trait loci (QTLs) related to fiber quality. These loci are distributed on different chromosomes in the cotton genome. The research also revealed the pleiotropy of some QTLs, that is, a single QTL can affect multiple fiber quality traits (Liu et al., 2020).
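One simple way to flag candidate pleiotropic loci is to look for markers that are significant for more than one trait. The following sketch assumes that per-trait lists of significant SNPs are already available; the SNP identifiers, the trait lists and the function name are hypothetical and are not taken from the cited studies, which may define pleiotropic QTLs by genomic interval rather than by exact marker.

```python
from collections import defaultdict


def find_pleiotropic_snps(significant_by_trait, min_traits=2):
    """Return {snp: sorted traits} for SNPs significant in >= min_traits traits."""
    traits_per_snp = defaultdict(set)
    for trait, snps in significant_by_trait.items():
        for snp in snps:
            traits_per_snp[snp].add(trait)
    return {
        snp: sorted(traits)
        for snp, traits in traits_per_snp.items()
        if len(traits) >= min_traits
    }


# Hypothetical GWAS hits for three fiber quality traits.
significant_by_trait = {
    "fiber_length": ["snp_A07_001", "snp_D11_014", "snp_A13_102"],
    "fiber_strength": ["snp_A07_001", "snp_D03_055"],
    "micronaire": ["snp_D11_014", "snp_A07_001"],
}

pleiotropic = find_pleiotropic_snps(significant_by_trait)
for snp, traits in sorted(pleiotropic.items()):
    print(snp, "->", ", ".join(traits))
```

Matching on exact marker identity is the strictest definition. Grouping nearby SNPs into shared intervals would be a natural extension but needs a window size, which is a further assumption.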
Several strategies can help address these challenges. First, using high-density SNP chips and advanced sequencing technology to increase the resolution of genotype data can help locate genetic variation related to fiber quality more accurately. Second, because cotton fiber quality is significantly affected by environmental as well as genetic factors, the interaction between genotype and environment should be considered in order to gain a more complete understanding of the genetic basis of fiber quality.
Third, the candidate genes and QTLs identified by GWAS should be functionally verified to understand their specific roles in cotton fiber development. Fourth, integrating GWAS results with multi-omics data such as the transcriptome and proteome can give a deeper understanding of how genetic variation affects the expression and regulation of fiber quality traits. Finally, to overcome issues such as limited sample size and multiple testing, the development and application of new statistical analysis methods is crucial for improving the efficiency and accuracy of GWAS. By implementing these strategies, scientists are expected to use GWAS more effectively to further improve cotton fiber quality and advance cotton breeding.
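The multiple testing issue mentioned above arises because a GWAS tests thousands of markers at once, so some small p-values are expected by chance alone. Two widely used corrections are the Bonferroni threshold and the Benjamini-Hochberg false discovery rate procedure. The sketch below implements both in plain Python under the assumption that one p-value per marker is available; the p-values and function names are illustrative and do not come from any cited study.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag p-values at or below alpha divided by the number of tests."""
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag discoveries using the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its own threshold.
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical p-values for ten markers tested against one trait.
p_values = [0.0298, 0.0001, 0.3240, 0.0019, 0.0459,
            0.0004, 0.0201, 0.0095, 0.0344, 0.0278]

bonferroni_hits = bonferroni(p_values)
bh_hits = benjamini_hochberg(p_values)
print("Bonferroni discoveries:", sum(bonferroni_hits))
print("Benjamini-Hochberg discoveries:", sum(bh_hits))
```

Bonferroni controls the chance of any false positive and is therefore conservative, whereas Benjamini-Hochberg controls the expected proportion of false discoveries and usually retains more markers. Which one is appropriate depends on the cost of following up a false lead.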
5 Conclusion
In the field of cotton fiber quality improvement, although genome-wide association studies (GWAS) have shown great potential, they also face a series of technical and methodological challenges. First, GWAS requires a large amount of genetic variation data and high-precision phenotypic data, which is particularly complex and costly to collect and analyze in cotton. In addition, due to the complex genetic background of cotton, common false positive results in GWAS need to be controlled through sophisticated statistical methods. These challenges require researchers to continue to innovate to improve the accuracy and efficiency of GWAS.
To overcome these challenges, researchers can adopt a variety of strategies. One approach is to improve the explanatory power and prediction accuracy of GWAS by integrating multi-omics data, such as transcriptomics, proteomics, and metabolomics data. This integrative approach helps reveal complex interactions between genes, phenotypes and environment. It is also critical to develop new statistical methods that can better handle large-scale data sets, reduce false positive rates, and increase the reliability of results. For example, the use of machine learning and artificial intelligence technology can greatly improve the efficiency and accuracy of analysis (Su and Ma, 2018).
Looking ahead, the role of GWAS in improving cotton fiber quality is expected to become even more important. As genome sequencing technology advances and costs decrease, more precise maps of genetic variation will become available for GWAS, improving the ability to identify relevant genes. In addition, improvements to GWAS methods, including more efficient sample integration and data analysis techniques, will further enhance their application potential in cotton genetic improvement (Chen et al., 2022). The development of GWAS will help pinpoint the key genes that control cotton fiber quality, promote the development of new varieties, and meet the market demand for high-quality cotton.
Although there are many challenges faced when applying GWAS to improve cotton fiber quality, these challenges can be overcome through innovation in technology and methodology. So far, GWAS has made remarkable achievements in improving the quality of cotton fiber. In the future, it will continue to play an important role and contribute to the development of the global textile industry. Continuing to explore and study the application of GWAS in cotton improvement will not only be of great significance to improving cotton quality, but will also promote progress in the fields of textile materials science and crop genetic improvement.
Bhat J.A., Yu D., Bohra A., Showkat A.G., and Varshney R.K., 2021, Features and applications of haplotypes in crop breeding, Commun. Biol., 4: 1266.
https://doi.org/10.1038/s42003-021-02782-y
Chen Y., Gao Y., and Chen P., 2022, Genome-wide association study reveals novel quantitative trait loci and candidate genes of lint percentage in upland cotton based on the CottonSNP80K array, Theoretical and Applied Genetics, 135(7): 2279-2295.
https://doi.org/10.1007/s00122-022-04111-1
Emil U., Huang Q.Q., Munung N.S., Jantina de V., Okada Y.K., Alicia R.M., Hilary C.M., Tuuli L., and Danielle P., 2021, Genome-wide association studies, Nature Reviews Methods Primers, 1: 59.
He S., Sun G., and Geng X., 2021, The genomic basis of geographic differentiation and fiber improvement in cultivated cotton, Nat. Genet., 53: 916-924.
https://doi.org/10.1038/s41588-021-00844-9
Li Y.Q., Zhao T., Fang L., and Zhang T.Z., 2023, Structural variation (SV)-based pan-genome and GWAS reveal the impacts of SVs on the speciation and diversification of allotetraploid cottons, Molecular Plant, 16: 678-693.
https://doi.org/10.1016/j.molp.2023.02.004
Liu R., Gao J.W., Xiao X.H., Zhang Z., Li J.W., and Liu A.L., 2018, GWAS analysis and QTL identification of fiber quality traits and yield components in upland cotton using enriched high-density SNP markers, Frontiers in Plant Science, 9: 392690.
https://doi.org/10.3389/fpls.2018.01067
Liu W., Song C.X., Ren Z.Y., Zhang Z.Q., Pei X.Y., Liu Y.G., He K.L., Zhang F., Zhao J.J., Wang X.X., Yang D.G., and Li W., 2020, Genome-wide association study reveals the genetic basis of fiber quality traits in upland cotton (Gossypium hirsutum L.), BMC Plant Biol., 20: 395.
https://doi.org/10.1186/s12870-020-02611-0
Liu X., Zhao B., Zheng H.J., Hu Y., Lu G., and Yang C.Q., 2015, Gossypium barbadense genome sequence provides insight into the evolution of extra-long staple fiber and specialized metabolites, Sci. Rep., 5: 14139.
https://doi.org/10.1038/srep14139
Ma Z., He S., Wang X., Sun J., Zhang Y., and Zhang G., 2018, Resequencing a core collection of upland cotton identifies genomic variation and loci influencing fiber quality and yield, Nat. Genet., 50: 803-813.
https://doi.org/10.1038/s41588-018-0119-7
Marees A.T., de Kluiver H., Stringer S., Vorspan F., Curis E., Marie-Claire C., and Derks E.M., 2018, A tutorial on conducting genome-wide association studies: Quality control and statistical analysis, Int. J. Methods Psychiatr. Res., 27(2): e1608.
https://doi.org/10.1002/mpr.1608
Muhammad Y., Hafiza H.K., Quaid H., Muhammad W.R., Muhammad S., Junkang R., and Jiang Y.R., 2022, Status and prospects of genome-wide association studies in cotton, Frontiers in Plant Science, 13: 21.
https://doi.org/10.3389/fpls.2022.1019347
Rice B.R., Fernandes S.B., and Lipka A.E., 2020, Multi-trait genome-wide association studies reveal loci associated with maize inflorescence and leaf architecture, Plant Cell Physiol., 61: 1427-1437.
https://doi.org/10.1093/pcp/pcaa039
Su J., and Ma Q., 2018, Multi-locus genome-wide association studies of fiber-quality related traits in Chinese early-maturity upland cotton, Frontiers in Plant Science, 9: 407810.
https://doi.org/10.3389/fpls.2018.01169
Su J., and Yu S., 2016, Detection of favorable QTL alleles and candidate genes for lint percentage by GWAS in Chinese upland cotton, Frontiers in Plant Science, 7: 216346.
https://doi.org/10.3389/fpls.2016.01576
Tam V., Patel N., Turcotte M., Yohan B., Guillaume P., and David M., 2019, Benefits and limitations of genome-wide association studies, Nat. Rev. Genet., 20: 467-484.
https://doi.org/10.1038/s41576-019-0127-1
Tibbs C.L., Zhang Z., and Yu J., 2021, Status and prospects of genome-wide association studies in plants, Plant Genome, 14(1): e20077.
https://doi.org/10.1002/tpg2.20077
Wang M., Tu L., and Yuan D., 2019, Reference genome sequences of two cultivated allotetraploid cottons, Gossypium hirsutum and Gossypium barbadense, Nat. Genet., 51: 224-229.
https://doi.org/10.1038/s41588-018-0282-x
Wang P., He S., and Sun G., 2021, Favorable pleiotropic loci for fiber yield and quality in upland cotton (Gossypium hirsutum), Sci. Rep., 11: 15935.